Feature Weight Optimization for Discourse-Level SMT
Abstract
We present an approach to feature weight optimization for document-level decoding. This is an essential task for enabling future development of discourse-level statistical machine translation, as it allows easy integration of discourse features in the decoding process. We extend the framework of sentence-level feature weight optimization to the document level. We show experimentally that we can get competitive and relatively stable results when using a standard set of features, and that this framework also allows us to optimize document-level features, which can be used to model discourse phenomena.
Similar resources
Semantics, Discourse and Statistical Machine Translation
In the past decade, statistical machine translation (SMT) has advanced from word-based SMT to phrase- and syntax-based SMT. Although this advancement produces significant improvements in BLEU scores, crucial meaning errors and lack of cross-sentence connections at the discourse level still hurt the quality of SMT-generated translations. More recently, we have witnessed two active movements in SM...
Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
We describe Docent, an open-source decoder for statistical machine translation that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT.
Document-Wide Decoding for Phrase-Based Statistical Machine Translation
Independence between sentences is an assumption deeply entrenched in the models and algorithms used for statistical machine translation (SMT), particularly in the popular dynamic programming beam search decoding algorithm. This restriction is an obstacle to research on more sophisticated discourse-level models for SMT. We propose a stochastic local search decoding method for phrase-based SMT, w...
Improving Implicit Discourse Relation Recognition Through Feature Set Optimization
We provide a systematic study of previously proposed features for implicit discourse relation identification, identifying new feature combinations that optimize F1-score. The resulting classifiers achieve the best F1-scores to date for the four top-level discourse relation classes of the Penn Discourse Tree Bank: COMPARISON, CONTINGENCY, EXPANSION, and TEMPORAL. We further identify factors for ...
Manyopt: An Extensible Tool for Mixed, Non-Linear Optimization Through SMT Solving
Optimization of Mixed-Integer Non-Linear Programming (MINLP) supports important decisions in applications such as Chemical Process Engineering. But current solvers have limited ability for deductive reasoning or the use of domain-specific theories, and the management of integrality constraints does not yet exploit automated reasoning tools such as SMT solvers. This seems to limit both scalabili...